2. Random variable, stochastic process and simulation.

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import scipy as sp

# We will be using random numbers from numpy random library
from numpy.random import normal, choice, uniform


import holoviews as hv
from holoviews import opts
hv.notebook_extension("plotly")

2.1. Random numbers, probability distribution and infernece libraries.

  • The numpy.random has the fastest random number generators that are based on low level code written in C.

  • The Scipy.stats has an extensive library of statistical distributions and tools for statistical analysis.

  • The Statsmodels Enhancing Scipy functionality with more tools for stat snslysis

  • The Seaborn library that enhanced matplotlib functionality for easy stat visualization.

PyMC3 and PyStan powerhouse libaries for Probabilistic modeling, MCMC and Bayesian inference.

2.2. General overview of random numbers in python

First we take a look at most widely used random numbers of numpy also called standard random numbers. These are rand (uniform random number on interval 0,1) and randn (stnadard normal random number with 0 mean and 1 variance). Useful Tip: when writing any code withr andom numbers is pay attention to the seed. If you want to test your code use the same seed to check for reproducability.

#np.random.seed(8376743) 
np.random.rand(10) # Generates standard uniform random numbers U(0,1)
array([0.09594303, 0.55568171, 0.66827622, 0.5508196 , 0.96480721,
       0.96316764, 0.41464708, 0.28530747, 0.46608068, 0.56021157])
#plt.plot(np.random.rand(1000), 'o', color='green', alpha=0.5)
plt.hist(np.random.rand(10000),20)
plt.ylabel('Counts')
plt.xlabel('Bins')
Text(0.5, 0, 'Bins')
../_images/02_RandVar_6_1.png
#np.random.seed(18493) 
np.random.randn(10) # Generates standard random numbers N(0,1)
array([ 0.91577119,  1.17172968, -1.1980772 , -0.87322399,  0.84957424,
        0.6253262 , -0.51120252,  1.37219998,  0.42012106, -0.23660843])
#plt.plot(np.random.randn(1000), 'o', alpha=0.5)

plt.hist(np.random.randn(1000))
(array([  4.,  12.,  61., 122., 211., 244., 173., 116.,  41.,  16.]),
 array([-3.36530532, -2.75305713, -2.14080895, -1.52856077, -0.91631259,
        -0.30406441,  0.30818377,  0.92043196,  1.53268014,  2.14492832,
         2.7571765 ]),
 <BarContainer object of 10 artists>)
../_images/02_RandVar_8_1.png

Parametraized random number allow you to set parameters like value of mean and interval length and therefore can be viewed as generalized versions of standard random numbers. Below we take a look at examples of continuous (random and unofrm RVs) and discrete random numbers generated by paratmerized distributions.

np.random.uniform(low=-1, high=1, size=(3, 4))
array([[-0.73715826, -0.4008936 ,  0.51212353,  0.68102689],
       [ 0.17995235,  0.05453276, -0.37258262,  0.35454684],
       [-0.16724103,  0.04110811,  0.52638791, -0.80142544]])
np.random.normal(loc=8, scale=10, size=(4, 4))
array([[  7.19416803,  16.29405381,  13.09412104, -10.76132091],
       [ 13.02596898,  17.56190323,  25.9512467 ,   5.398422  ],
       [ 20.1037163 ,  -2.0630515 ,   3.50148201,  17.49678852],
       [ 16.41672446,  -2.53257741,   1.3523871 ,  14.64985089]])
np.random.binomial(n=10, p=0.6, size=20) # This one is Binomial distributions. You can see it is discrete as  exepcted. 
array([4, 6, 9, 5, 8, 2, 6, 7, 6, 7, 4, 6, 8, 7, 7, 7, 6, 5, 7, 6])

2.2.1. Using random numbers to get answers via simulations

One of the major uses of random numbers is for conducting numerical simulations. What is a simulation? It is a recreation of a process on a computer. And this recreation is done by random numbers. E.g to simulate coint tosses, die throws, diffusion of molecules, conformational change of polymers we use random number to recreate the process on a computer. Let’s start off by asking some simple questions

  • How often do we get a run of 5 or more consecutive heads in 100 coin tosses if we repeat the experiment 1000 times?

  • What if the coin is biased to generate heads only 40% of the time?

L = 100    # length of each trajectory
N = 1      # number of experiments: stochastic trajecotries generated

xs = np.random.choice([0,1], (L, N)) # (i) Unbiased coin p=[0.5,0.5] by default

ys = np.random.choice([0,1], (L, N), p=[0.9, 0.1])  # (ii) biased coin
fig, ax  = plt.subplots(ncols=2)

ax[0].plot(xs,'o',alpha=0.5,label  =  'data1')
ax[0].plot(ys,'o',alpha=0.5, label =  'data2')

ax[1].hist((xs[:,0], ys[:,0]), 2, label = ("data1", "data2"))

ax[0].legend()
ax[1].legend()
<matplotlib.legend.Legend at 0x7f15a5ec8a10>
../_images/02_RandVar_16_1.png
np.random.seed(12738)

# Non-fancy way of simulating the answer
runs = 0
for x in xs:    # plug either xs or ys
    m = 0
    for i in x:
        if i == 1:
            m += 1
            if m >=5:
                runs += 1
                break
        else:
            m = 0
runs
0

2.2.2. Simple, discrete, unbiased random walk in 1D.

Let’s start off by simulating a random walk outlined in class by using a random number generators from numpy called choice and normal. Type help(np.random.choice) and help(np.random.normal) to learn more.

numpy.random.choice(A,size=(M,N,K)) 

returns element from a list of objects, A with uniform and equal probabilities (default) or with unqeual probabilities once supplied a list. Returend array can be a 1D, 2D or 3D aray of shape M,N,K.

choice(['bagel','muffin','croissant'], size=(3,5))
array([['bagel', 'croissant', 'muffin', 'muffin', 'croissant'],
       ['muffin', 'croissant', 'bagel', 'croissant', 'bagel'],
       ['croissant', 'croissant', 'muffin', 'croissant', 'bagel']],
      dtype='<U9')
  • The numpy.random.normal(mu,sigma,size=(M,N,K)) retruns random number distributed according to gausian probability function in the form of 1D, 2D or 3D arays of length N,M,K

# Here we generate 3 sequences of normally distributed random variables of length 5
normal(0,2,(3,5))
array([[-2.69161384, -1.76025845,  2.6359338 , -1.27629114,  0.28905087],
       [ 1.10431222, -2.35128534,  1.62603244, -0.08737834,  0.95731216],
       [-0.47182587, -0.10790324,  0.85141111,  0.9185157 , -0.58355616]])

2.2.3. Simulating a 1D unbiased random walk

In the course of simulating random walks we will be genreating multidimensional numpy arrays. We will adhere to a convnetion that:

  • Rows are regarded as number of measurments, or samples

  • Columns are regarded as number of observables distinct measruments/trajectories

def rw_1d(L, N=1):
    
    '''1d random walk function:
    L: trajectory length
    N: Number of trajecotry
    '''
    rw = choice([-1,1], size=(L, N))
    
    rw_sum=rw.cumsum(axis=0)

    rw_sum[0,:]=0 
    
    return rw_sum
rw_sum = rw_1d(1000, 100)
plt.plot(rw_sum, alpha=0.7)  


plt.title(f'{N} random walks', fontsize=20)
plt.xlabel(r'$Steps$',fontsize=20)
plt.ylabel(r'$Position$',fontsize=20)
plt.grid('--')
../_images/02_RandVar_27_0.png
import pandas as pd
rw_data = pd.DataFrame(rw_sum)
rw_data
0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 2 2 0 0 0 0 0 0 0 ... -2 -2 0 2 0 0 0 0 0 -2
2 -1 3 3 1 1 -1 1 -1 1 1 ... -1 -3 1 3 -1 -1 1 1 -1 -3
3 0 2 4 0 2 -2 0 0 2 2 ... 0 -4 2 4 -2 0 0 2 -2 -4
4 1 1 5 -1 3 -3 1 -1 1 1 ... -1 -5 1 5 -1 -1 1 3 -3 -3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 20 34 44 -26 0 -52 8 40 30 -24 ... 50 2 20 -36 26 56 30 -16 -6 12
996 21 33 43 -27 -1 -53 9 41 31 -25 ... 51 1 19 -37 25 55 31 -17 -7 13
997 22 34 42 -26 0 -52 10 42 30 -26 ... 52 0 18 -38 24 56 30 -18 -6 12
998 21 35 43 -27 1 -53 11 41 31 -27 ... 51 1 17 -37 25 55 29 -17 -5 11
999 20 36 44 -26 0 -52 10 40 32 -28 ... 52 0 18 -38 26 56 30 -16 -6 10

1000 rows × 100 columns

plot = hv.Curve(rw_data[0].values)

for i in range(1,50):
    plot*=hv.Curve(rw_data[i].values)

plot #+ hv.Histogram(np.histogram(rw_data[4][500:]))

2.2.4. Statistics of random walk

from scipy.stats import norm # we need this to fit to normal distribution
fig, ax = plt.subplots(1, 5, figsize=(9, 3), sharey=True) 

L=100
N=1000
rw_sum = rw_1d(L,N)

for i in range(5):  
    
    t = int(i*(L/5)) + 5                   # record dist at 5 equidist points
    
    ax[i].hist(rw_sum[t,:], density=True)  # generate histogram at different points
    
    # Fit histogram to normal dist 
    xmin, xmax = ax[i].get_xlim()
    x = np.linspace(xmin, xmax, 100)
    
    y = norm.pdf(x, 0, np.sqrt(t))
    
    ax[i].plot(x,y)  
    
    ax[i].set_title(f"t = {t}")
../_images/02_RandVar_32_0.png

2.2.5. Connection with diffusion: Mean square displacement

# Determine the time evolution of the mean square distance.
plt.plot(np.mean(rw_sum**2, axis=1), '-o') # why only rw**2 and not mean(rw-mean(rw))**2 ? 

plt.title('Compute mean square deviation of 1D random walker',fontsize=15)
plt.xlabel('Number of steps, n',fontsize=15)
plt.ylabel(r'$MSD(n)=\langle x(n)^2 \rangle$',fontsize=15);
../_images/02_RandVar_34_0.png
# Questions: 
# (i)  Compute MSD for @D random walk show the expected scaling 
# (ii) Generate random walks from different positions
# (iii) choice between [-1,1] 
# (iv)  Multinomial choice
# (v)  Self avoiding walks
def rw_2d(L, N=1):
    
    '''2d random walk function:
    L: trajectory length
    N: Number of trajecotry
    '''
    verteces = np.array([(1,0),(0,1),(-1,0),(0,-1)])
    
    rw       = verteces[choice([0,1,2,3], size=(L, N))]
    
    rw[0, :, :] = 0
    
    return rw.cumsum(axis=0)
verteces = np.array([(1,0),(0,1),(-1,0),(0,-1)])
    
rw       = verteces[choice([0,1,2,3], size=(L, N))]
%%time
rw_sum = rw_2d(1000,5)
rw_sum.shape
CPU times: user 861 µs, sys: 65 µs, total: 926 µs
Wall time: 679 µs
(1000, 5, 2)
plt.plot(rw_sum[:,:,0], rw_sum[:,:,1])
plt.plot('')
[<matplotlib.lines.Line2D at 0x7f15a4f8a490>]
../_images/02_RandVar_39_1.png

2.2.6. Binomial distribution of a random walker

  • In previous example we started with random variables with no reference to probability distribution. This time we will generate random numbers from a binomial distribution.

  • This time we will use scipy.stats library which contains probability distirbution functions. That is in addition to generating random variables we can also compute probability distributions and related quantities analytically.

from scipy.stats import binom, norm, poisson  
s =  binom(10, 0.5) # Let us declare s to be a binomial RV
print(s.rvs(20))          # 20 random samples form X
print(s.pmf(5))           # P(X = 5)
print(s.cdf(5))           # P(X <= 5)
print(s.mean())           # E[X], mean
print(s.var())            # Var(X), variance
print(s.std())            # Std(X), standard deviation
[6 6 6 4 6 7 6 8 3 4 2 2 4 7 6 7 4 5 4 3]
0.24609375000000025
0.6230468749999999
5.0
2.5
1.5811388300841898
def coin_flip(L,p,N):
    
    '''
    L: flip coint L times 
    q: with p probability
    N: Repeat experiment N times
    ''' 
    coin       = binom(L, p)   # Binomial RV
    
    coin_flips = coin.rvs(N)   # Generate sample of N points
    coin_pmf   = coin.pmf(np.arange(L+1)) # Generate PMF from 0 to L 
    coin_cdf   = coin.cdf(np.arange(L+1))
    
    return coin_flips, coin_pmf, coin_cdf
fig, ax=plt.subplots(nrows=1, ncols=3,figsize=(12,3))

#Coin flip 1 time.
X1, P1, CP1 = coin_flip(20, 0.5, 80)

ax[0].plot(X1,'--s',color='purple')

ax[1].hist(X1,density=True)
ax[1].plot(P1,'-o')

ax[2].plot(CP1,'-d',color='red')
[<matplotlib.lines.Line2D at 0x7f15a4e81c50>]
../_images/02_RandVar_45_1.png

2.3. Continuous time random walk: Brownian motion

We have learned about random variables, random walk and have encorunetred a concept of stochastic process on the example of discrete step 1D random walk. Now let us generate the prototypical stochasic process in continuous time; the brownian motion. Brownian motion was first discoverd by a botanist Brown who noticed that a pollen in solution undergo erratic and incessant motion. To simulate brownian motion we take the continuous time limit of random walk and approximate dsiplacements of our particle as normally distributed (binomial->normal, time step->continuous time)

\[x(t+dt)-x(t)=N(0,\sqrt{2D dt})\]

We assume we have started at position \(\mu=0\) and our variance is given by \(\sigma^2=2Dt\) Where D is the diffusion coefficnets which is related to parameters of discree random walk as shown in the lecture.

\[x(t+dt)=x(t)+\sqrt{2D dt} \cdot N(0,1)\]

In the last step we re-wrote brownian motion equation in a convenient way by shifting normally distributed radnom variable by \(\mu\) and scaling it by \(\sigma\)

$\(N(\mu, \sigma^2)=\mu+\sigma N(0,1)\)$.

def brown(T, N=1, dim=1, dt=1, D=1):
    
    """
    Creates 2D brownian path given:
    time T 
    N=1 trajecotires
    dim=2: 2D by default
    dt=1 timestep
    D=1 diffusion coeff
    """
    
    nT = int(T/dt) # how many points to sample
    
    dR = np.sqrt(2*D*dt) * np.random.randn(N, nT, dim) # 3D position of brownian particle
    
    R = np.cumsum(dR, axis=1) # accumulated 3D position of brownian particle
    
    return R

2.3.1. Brownian motion in 1D

R=brown(1000, N=50,dim=1)
R.shape
(50, 1000, 1)
[plt.plot(R[i, :, :]) for i in range(3)];
../_images/02_RandVar_50_0.png

2.3.2. Brownian motion in 2D/3D

R=brown(1000,N=10,dim=2)

print(R.shape) 
print('')
print(len(R))
(10, 1000, 2)

10
R=brown(500, N=10, dim=2)

hv.extension('plotly')

curve = hv.Curve(R[0,:,:])

for i in range(1,len(R)):
    
    curve*=hv.Curve(R[i,:,:]) 

curve.opts(width=600, height=600)
R=brown(500, N=10, dim=3)

hv.extension('plotly')
hv.Path3D( R[i,:,:] for i in range(10) ).opts(width=800,height=800, line_width=3)

2.3.3. Statistics of Brownian motion

R = brown(1000, N=10000, dim=1)
R.shape
(10000, 1000, 1)
[plt.hist(R[:,i,:],   alpha=0.3,  density=True) for i in range(5)];
../_images/02_RandVar_57_0.png

2.3.4. Diffusion Equation

The movement of individual random walkers \(\leftrightarrow\) density of walkers \(\rho(\vec{r},t)\)

Diffusion equation:

Formulated empirically as Fick’s laws

\[\frac{\partial\rho}{\partial t} = \mathcal{D}\nabla^2\rho\]
  • This is a 2nd order PDE! Unlike equations of motion diff eq shows irreersibile behaviour

  • This one exactly solvable. But in general reaction-diffusion PDEs difficult to solve analytically.

  • Can solve numerically by writing derivatives as finite differences!

  • Can also simulate via random walk!

  • Diffusion coefficient \(D\), Units \([L^2]/[T]\)

Important special case solution (here written in 1d):

\[\rho(x,t) = \frac{1}{\sqrt{2\pi \sigma(t)^2}}\exp\left(-\frac{x^2}{2\sigma(t)^2}\right),\]

where \(\sigma(t)=\sqrt{2{D}t}\)

  • density remains Gaussian

  • Gaussian becomes wider with time

  • check that this is indeed a solution by plugging into the diffusion equation!

import ipywidgets as widgets

# time dependent sigma
def sigma(t, D = 1):
    return np.sqrt(2*D*t)

# density function is gaussian
def gaussian(x, t):
    return  1/np.sqrt(2*np.pi*sigma(t)**2) * np.exp(-x**2/(2*sigma(t)**2)) #

@widgets.interact(t=(0.001,1,0.001))
def diffusion(t=0.001):
    
    x = np.linspace(-4, 4, 1000)
    
    plt.plot(x, gaussian(x, 0.001), '--', color='orange')
    
    plt.plot(x, gaussian(x, t), lw=3, color='green')